Delay Mechanism for Unbalanced Read/Write Paths in Domino SRAM Arrays

ABSTRACT

A memory system, e.g., a domino static random access memory (SRAM), includes a plurality of memory cells and a wordline decoder coupled to the memory cells through wordlines. The wordline decoder provides a wordline signal to one or more memory cells over the wordlines to allow access to the memory cell(s) for a read operation or a write operation. A read_wl signal and a write_wl signal are generated by the wordline decoder based on whether a read or a write operation is to be performed in the next cycle. The wordline decoder includes a buffer having an input for receiving the write_wl signal and an output for outputting a delayed version of the write_wl signal. The wordline signal is activated by the wordline decoder based on the read_wl signal and the delayed write_wl signal. This overcomes the “early read” problem in which write performance is degraded due to a fast read path.

CROSS-REFERENCE TO RELATED APPLICATION

This patent application is a continuation application of U.S. patent application Ser. No. 11/560,428 (docket no. ROC920060443US1), filed Nov. 16, 2006, entitled “DELAY MECHANISM FOR UNBALANCED READ/WRITE PATHS IN DOMINO SRAM ARRAYS”, which is hereby incorporated by reference.

BACKGROUND OF THE INVENTION

1. Field of Invention

The present invention relates in general to the digital data processing field. More particularly, the present invention relates to semiconductor memories within digital data processing systems.

2. Background Art

In the latter half of the twentieth century, there began a phenomenon known as the information revolution. While the information revolution is a historical development broader in scope than any one event or machine, no single device has come to represent the information revolution more than the digital electronic computer. The development of computer systems has surely been a revolution. Each year, computer systems grow faster, store more data, and provide more applications to their users.

A modern computer system typically comprises at least one central processing unit (CPU) and supporting hardware, such as communications buses and memory, necessary to store, retrieve and transfer information. It also includes hardware necessary to communicate with the outside world, such as input/output controllers or storage controllers, and devices attached thereto such as keyboards, monitors, tape drives, disk drives, communication lines coupled to a network, etc. The CPU or CPUs are the heart of the system. They execute the instructions which comprise a computer program and direct the operation of the other system components.

The overall speed of a computer system is typically improved by increasing parallelism, and specifically, by employing multiple CPUs (also referred to as processors). The modest cost of individual processors packaged on integrated circuit chips has made multiprocessor systems practical, although such multiple processors add more layers of complexity to a system.

From the standpoint of the computer's hardware, most systems operate in fundamentally the same manner. Processors are capable of performing very simple operations, such as arithmetic, logical comparisons, and movement of data from one location to another. But each operation is performed very quickly. Sophisticated software at multiple levels directs a computer to perform massive numbers of these simple operations, enabling the computer to perform complex tasks. What is perceived by the user as a new or improved capability of a computer system is made possible by performing essentially the same set of very simple operations, using software having enhanced function, along with faster hardware.

Among such faster hardware is static random access memory (SRAM), which is typically faster than dynamic random access memory (DRAM). Accordingly, SRAM is frequently used where speed is a primary consideration, such as in CPU caches and external caches. One type of SRAM known in the art is high performance domino SRAM. For example, U.S. Pat. No. 5,668,761, entitled “FAST READ DOMINO SRAM”, issued on Sep. 16, 1997 to Muhich et al., and assigned to IBM Corporation, discloses a high performance domino SRAM and is hereby incorporated herein by reference in its entirety.

A domino SRAM combines an SRAM with a dynamic circuit known as a “domino circuit”. To clarify that dynamic circuits are different from dynamic type memories, such as DRAMs, dynamic circuits are referred to herein as domino circuits or logic. In general, domino logic is a circuit design technique that makes use of dynamic circuits, and has the advantage of low propagation delay (i.e., these are fast circuits) and smaller area (i.e., due to fewer transistors). In domino logic, dynamic nodes are precharged during a portion of a clock cycle and conditionally discharged during another portion of the clock cycle, where the discharging performs the logic function.
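
The precharge and conditional-discharge behavior described above can be sketched as a small behavioral model. This is only an illustrative sketch under simplifying assumptions (ideal logic levels, no timing); the names DominoGate and domino_and2 are hypothetical and do not appear in the referenced patents.

```python
class DominoGate:
    """Toy model of one domino stage: a dynamic node plus an output inverter."""

    def __init__(self):
        # Assumption: the dynamic node starts in its precharged (high) state.
        self.node = 1

    def precharge(self):
        """Precharge phase: the dynamic node is pulled high."""
        self.node = 1
        return self.output()

    def evaluate(self, pulldown_on):
        """Evaluate phase: the node may only be discharged, never recharged."""
        if pulldown_on:
            self.node = 0
        return self.output()

    def output(self):
        # The static inverter makes the stage output rise when the node falls.
        return 1 - self.node


def domino_and2(a, b):
    """Two-input AND: the pull-down path conducts only if both inputs are high."""
    gate = DominoGate()
    gate.precharge()
    return gate.evaluate(bool(a and b))
```

In this model the discharging step is what computes the function, and a node that has discharged stays low until the next precharge, which is the property that makes an unintended early discharge costly.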

FIG. 1 illustrates a conventional memory system. The memory system comprises a wordline decoder, a plurality of semiconductor memory cells, a bitline decoder, and an input/output circuit. In general, a memory system typically includes a memory cell array that has a grid of bitlines and wordlines, with semiconductor memory cells disposed at intersections of the bitlines and wordlines. During operation, the bitlines and wordlines are selectively asserted or negated to enable at least one of the memory cells to be read or written. The wordline decoder is coupled to the memory cells to provide a plurality of decoded data. Additionally, the bitline decoder is coupled to the memory cells to communicate data which has been decoded or will be decoded. The input/output circuit is coupled to the bitline decoder to communicate data with the bitline decoder and to determine a value which corresponds to that data.

FIGS. 2A, 2B and 2C illustrate a conventional high performance, low power domino SRAM design including multiple local cell groups. As shown in FIG. 2A, each cell group includes multiple SRAM cells 1-N and local true and complement bitlines LBLT and LBLC. Each SRAM cell includes a pair of inverters that operate together in a loop to store true and complement (T and C) data. The local true bitline LBLT and the local complement bitline LBLC are connected to each SRAM cell by a pair of wordline N-channel field effect transistors (NFETs) to respective true and complement sides of the inverters. A WORDLINE provides the gate input to the wordline NFETs. A particular WORDLINE is activated, turning on respective wordline NFETs to perform a read or write operation.

As shown in FIG. 2B, the prior art domino SRAM includes multiple local cell groups 1-M. Associated with each local cell group are precharge true and complement circuits coupled to the respective local true and complement bitlines LBLT and LBLC, write true and write complement circuits, and a local evaluate circuit. Each of the local evaluate circuits is coupled to a global bitline labeled 2ND STAGE EVAL and a second stage inverter that provides output data or is coupled to more stages. A write predriver circuit receiving input data and a write enable signal provides write true WRITE T and write complement WRITE C signals to the write true and write complement circuits of each local cell group.

A read occurs when a wordline is activated. Since true and complement (T and C) data is stored in the SRAM memory cell, either the precharged high true local bitline LBLT will be discharged if a zero was stored on the true side or the precharged high complement local bitline LBLC will be discharged if a zero was stored on the complement side. The local bitline, LBLT or LBLC, connected to the side storing a one will remain in its high precharged state. If the true local bitline LBLT was discharged, then the zero will propagate through one or more series of domino stages eventually to the output of the SRAM array. If the true local bitline LBLT was not discharged, then no switching through the domino stages will occur and the precharged value will remain at the SRAM output.

To perform a write operation, the wordline is activated as in a read. Then either the write true WRITE T or write complement WRITE C signal is activated, which pulls either the true or complement local bitline low via the respective write true circuit or write complement circuit while the other local bitline remains at its precharged level, thus updating the SRAM cell.
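
The read and write behavior of a single cell on its local bitlines can be illustrated with a minimal behavioral sketch. It assumes ideal logic levels and ignores timing; the class SramCell and the parameter pull_true_low are hypothetical names chosen for this example only.

```python
class SramCell:
    """Toy model of one SRAM cell storing true and complement (T and C) data."""

    def __init__(self, true_side=0):
        self.true_side = true_side

    @property
    def complement_side(self):
        # The cross-coupled inverters keep the two sides complementary.
        return 1 - self.true_side

    def read(self):
        """Wordline active: the bitline on the side holding a zero discharges."""
        lblt, lblc = 1, 1  # both local bitlines start precharged high
        if self.true_side == 0:
            lblt = 0
        if self.complement_side == 0:
            lblc = 0
        return lblt, lblc

    def write(self, pull_true_low):
        """Wordline active: one bitline is pulled low, the other stays high."""
        lblt, lblc = (0, 1) if pull_true_low else (1, 0)
        # The cell takes on the values forced onto its two sides.
        self.true_side = lblt
        return lblt, lblc
```

For example, a cell holding a zero on its true side returns (0, 1) from read(), and calling write(pull_true_low=False) flips it so that a later read() returns (1, 0).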

As shown in FIG. 2C, a wordline decoder includes circuitry that outputs an intermediate output signal OUT to other decode circuitry (not shown) that activates the appropriate precharge and wordline signals. As mentioned earlier, the wordline signal allows access to the memory cells for reads and writes. A read wordline signal READ_WL and a write wordline signal WRITE_WL are generated as outputs of a flip-flop with a data input signal READ_WRITEBAR. The data input signal READ_WRITEBAR indicates whether a read operation or a write operation will be performed in the next cycle of a clock input signal CLOCK. The read wordline signal READ_WL and at least two address bit signals A0 and A1 are AND'd together in a decode block. In addition, the write wordline signal WRITE_WL and the at least two address bit signals A0 and A1 are AND'd together in the decode block. These two AND outputs are OR'd in the decode block to produce the intermediate output signal OUT, which proceeds through the other decode circuitry which ultimately triggers the rising edge of the precharge and the wordline signals.
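
The AND/OR structure of the conventional decode block can be written out as a short combinational sketch. The function names are hypothetical, and the flip-flop polarity (READ_WRITEBAR high selecting a read, with WRITE_WL as its complement) is an assumption inferred from the signal name rather than something stated for FIG. 2C.

```python
def flip_flop_outputs(read_writebar):
    """Return (READ_WL, WRITE_WL) for the latched READ_WRITEBAR value.

    Assumed polarity: 1 selects a read, 0 selects a write.
    """
    read_wl = 1 if read_writebar else 0
    write_wl = 1 - read_wl
    return read_wl, write_wl


def decode_out(read_wl, write_wl, a0, a1):
    """Intermediate output OUT of the conventional decode block."""
    read_term = read_wl & a0 & a1    # READ_WL AND'd with the address bits
    write_term = write_wl & a0 & a1  # WRITE_WL AND'd with the same address bits
    return read_term | write_term    # the two AND outputs are OR'd
```

Because both terms use the same undelayed address bits, OUT is asserted at the same point in the cycle for a read and for a write in this conventional arrangement.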

FIG. 3 is a timing diagram showing the operation of the prior art domino SRAM shown in FIGS. 2A, 2B and 2C. Domino SRAM arrays, like domino logic, are governed by the behavior of the precharge cycle. Reads and writes to the SRAM cells occur during the evaluation phase when the precharge signal is high. Consequently, the wordline signal WL, which is the output of the wordline decoder and which allows access to the memory cells for reads and writes, follows the precharge signal closely. An efficient design will employ as much of the same decode/precharge/wordline circuitry as possible for both read and write operations, but a problem arises when the timing demands of a read operation and a write operation conflict. For example, a fast read path requires an early rising precharge signal and wordline signal WL, which can cause difficulties during a write operation. That is, if the wordline signal WL is high a significant amount of time before arrival of the write data, it is as if a read operation had commenced, and a bitline signal BL (denoted with reference numeral “305” in FIG. 3) may start to fall contrary to what is required by the write data. Once this fall occurs, the bitline signal BL is slow to rise. In order for write performance to be efficient, this bitline signal BL must exhibit a profile that does not prematurely fall. Hence, the “early read” problem degrades the write performance of the domino SRAM.

Therefore, a need exists for an enhanced mechanism for handling unbalanced read/write paths in domino SRAM arrays.

SUMMARY OF THE INVENTION

According to the preferred embodiments of the present invention, a memory system, e.g., a domino static random access memory (SRAM), includes a plurality of memory cells and a wordline decoder coupled to the memory cells through a plurality of wordlines. The wordline decoder provides a wordline signal to one or more of the memory cells over one or more of the wordlines to allow access to the one or more memory cells for a read operation or a write operation. A read_wl signal and a write_wl signal are generated by the wordline decoder based on whether a read operation or a write operation is to be performed in the next cycle. The wordline decoder includes a buffer having an input for receiving the write_wl signal and an output for outputting a delayed version of the write_wl signal. The wordline signal is activated by the wordline decoder based on the read_wl signal and the delayed write_wl signal. This overcomes the “early read” problem in which write performance is degraded due to a fast read path. This solution also advantageously permits the same circuitry (e.g., decode/precharge/wordline) to be used for both the read operation and the write operation.

According to another aspect of the preferred embodiments of the present invention, the delay applied to the write_wl signal by the buffer is adjustable to match the timing requirements of the write operation.

The foregoing and other features and advantages of the invention will be apparent from the following more particular description of the preferred embodiments of the invention, as illustrated in the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The preferred exemplary embodiments of the present invention will hereinafter be described in conjunction with the appended drawings, where like designations denote like elements.

FIG. 1 is a block diagram illustrating a conventional memory system.

FIG. 2A is a schematic diagram illustrating a local cell group of a conventional high performance, low power domino static random access memory (SRAM).

FIG. 2B is a schematic diagram illustrating circuitry of a bitline decoder of a conventional high performance, low power domino SRAM including multiple local cell groups of FIG. 2A.

FIG. 2C is a block diagram illustrating circuitry of a wordline decoder of the conventional domino SRAM shown in FIGS. 2A and 2B.

FIG. 3 is a timing diagram showing the operation of the conventional domino SRAM shown in FIGS. 2A, 2B and 2C.

FIG. 4 is a block diagram of a computer apparatus in accordance with the preferred embodiments of the present invention.

FIG. 5 is a block diagram illustrating a memory system in accordance with the preferred embodiments of the present invention.

FIG. 6 is a block diagram illustrating circuitry of a wordline decoder of a domino SRAM in accordance with the preferred embodiments of the present invention.

FIG. 7 is a timing diagram showing the operation of a domino SRAM in accordance with the preferred embodiments of the present invention.

FIG. 8 is a schematic diagram of an illustrative example of a buffer having a fixed delay for the wordline decoder shown in FIG. 6.

FIG. 9 is a schematic diagram of another illustrative example of a buffer having an adjustable delay for the wordline decoder shown in FIG. 6.

FIG. 10 is a flow diagram illustrating a method for adjusting the delay of a write path of a domino SRAM in accordance with the preferred embodiments of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

1.0 Overview

In accordance with the preferred embodiments of the present invention, a memory system, e.g., a domino static random access memory (SRAM), includes a plurality of memory cells and a wordline decoder coupled to the memory cells through a plurality of wordlines. The wordline decoder provides a wordline signal to one or more of the memory cells over one or more of the wordlines to allow access to the one or more memory cells for a read operation or a write operation. A read_wl signal and a write_wl signal are generated by the wordline decoder based on whether a read operation or a write operation is to be performed in the next cycle. The wordline decoder includes a buffer having an input for receiving the write_wl signal and an output for outputting a delayed version of the write_wl signal. The wordline signal is activated by the wordline decoder based on the read_wl signal and the delayed write_wl signal. This overcomes the “early read” problem in which write performance is degraded due to a fast read path. In the preferred embodiments of the present invention, this solution also advantageously permits the same circuitry (e.g., decode/precharge/wordline) to be used for both the read operation and the write operation.

In accordance with another aspect of the preferred embodiments of the present invention, the delay applied to the write_wl signal by the buffer is adjustable to match the timing requirements of the write operation.

2.0 Detailed Description

A computer system implementation of the preferred embodiments of the present invention will now be described with reference to FIG. 4 in the context of a particular computer system 400, i.e., an IBM eServer iSeries or System i computer system. However, those skilled in the art will appreciate that the memory system, method and computer program product of the present invention apply equally to any computer system, regardless of whether the computer system is a complicated multi-user computing apparatus, a single user workstation, a PC, or an embedded control system. As shown in FIG. 4, computer system 400 comprises one or more processors 401A, 401B, 401C and 401D, a main memory 402, a mass storage interface 404, a display interface 406, a network interface 408, and an I/O device interface 409. These system components are interconnected through the use of a system bus 410.

FIG. 4 is intended to depict the representative major components of computer system 400 at a high level, it being understood that individual components may have greater complexity than represented in FIG. 4, and that the number, type and configuration of such components may vary. For example, computer system 400 may contain a different number of processors than shown.

Processors 401A, 401B, 401C and 401D (also collectively referred to herein as “processors 401”) process instructions and data from main memory 402. Processors 401 temporarily hold instructions and data in a cache structure for more rapid access. In the embodiment shown in FIG. 4, the cache structure comprises caches 403A, 403B, 403C and 403D (also collectively referred to herein as “caches 403”) each associated with a respective one of processors 401A, 401B, 401C and 401D. For example, each of the caches 403 may include a separate internal level one instruction cache (L1 I-cache) and level one data cache (L1 D-cache), and level two cache (L2 cache) closely coupled to a respective one of processors 401. However, it should be understood that the cache structure may be different; that the number of levels and division of function in the cache may vary; and that the system might in fact have no cache at all.

Note that certain aspects of the preferred embodiments of the present invention may be implemented in hardware, while other aspects may be implemented in software. For example, the memory system and method of the present invention are preferably implemented entirely in hardware, e.g., main memory 402, caches 403, and/or other memory device(s). Other aspects of the present invention, such as an adjustable delay mechanism 420, are preferably implemented at least partially in software.

Main memory 402 in accordance with the preferred embodiments contains data 416, an operating system 418 and application software, utilities and other types of software. Optionally, main memory 402 may also contain an adjustable delay mechanism 420, which, as discussed in more detail below with reference to FIG. 10, implements an adjustable delay in a memory system's wordline decoder to match the timing requirements of a write operation. While the adjustable delay mechanism 420 is shown separate and discrete from operating system 418 in FIG. 4, the preferred embodiments expressly extend to adjustable delay mechanism 420 being implemented within the operating system 418. In addition, adjustable delay mechanism 420 may be implemented in application software, utilities, or other types of software within the scope of the preferred embodiments.

Computer system 400 utilizes well known virtual addressing mechanisms that allow the programs of computer system 400 to behave as if they have access to a large, single storage entity instead of access to multiple, smaller storage entities such as main memory 402 and DASD device 412. Therefore, while data 416, operating system 418, and adjustable delay mechanism 420 are shown to reside in main memory 402, those skilled in the art will recognize that these items are not necessarily all completely contained in main memory 402 at the same time. It should also be noted that the term “memory” is used herein to generically refer to the entire virtual memory of the computer system 400.

Data 416 represents any data that serves as input to or output from any program in computer system 400. Operating system 418 is a multitasking operating system known in the industry as OS/400 or IBM i5/OS; however, those skilled in the art will appreciate that the spirit and scope of the present invention is not limited to any one operating system.

According to the preferred embodiments of the present invention, adjustable delay mechanism 420 provides the functionality for implementing an adjustable delay in a memory system's wordline decoder to match the timing requirements of a write operation. Adjustable delay mechanism 420, if present, may be pre-programmed, manually programmed, transferred from a recording media (e.g., CD ROM 414), or downloaded over the Internet (e.g., over network 426).

Processors 401 may be constructed from one or more microprocessors and/or integrated circuits. Processors 401 execute program instructions stored in main memory 402. Main memory 402 stores programs and data that may be accessed by processors 401. When computer system 400 starts up, processors 401 initially execute the program instructions that make up operating system 418. Operating system 418 is a sophisticated program that manages the resources of computer system 400. Some of these resources are processors 401, main memory 402, mass storage interface 404, display interface 406, network interface 408, I/O device interface 409 and system bus 410.

Although computer system 400 is shown to contain four processors and a single system bus, those skilled in the art will appreciate that the present invention may be practiced using a computer system that has a different number of processors and/or multiple buses. In addition, the interfaces that are used in the preferred embodiments each include separate, fully programmed microprocessors that are used to off-load compute-intensive processing from processors 401. However, those skilled in the art will appreciate that the present invention applies equally to computer systems that simply use I/O adapters to perform similar functions.

Mass storage interface 404 is used to connect mass storage devices (such as a direct access storage device 412) to computer system 400. One specific type of direct access storage device 412 is a readable and writable CD ROM drive, which may store data to and read data from a CD ROM 414.

Display interface 406 is used to directly connect one or more displays 422 to computer system 400. These displays 422, which may be non-intelligent (i.e., dumb) terminals or fully programmable workstations, are used to allow system administrators and users (also referred to herein as “operators”) to communicate with computer system 400. Note, however, that while display interface 406 is provided to support communication with one or more displays 422, computer system 400 does not necessarily require a display 422, because all needed interaction with users and processes may occur via network interface 408.

Network interface 408 is used to connect other computer systems and/or workstations 424 to computer system 400 across a network 426. The present invention applies equally no matter how computer system 400 may be connected to other computer systems and/or workstations, regardless of whether the network connection 426 is made using present-day analog and/or digital techniques or via some networking mechanism of the future. In addition, many different network protocols can be used to implement a network. These protocols are specialized computer programs that allow computers to communicate across network 426. TCP/IP (Transmission Control Protocol/Internet Protocol) is an example of a suitable network protocol.

The I/O device interface 409 provides an interface to any of various input/output devices.

At this point, it is important to note that while this embodiment of the present invention has been and will be described in the context of a fully functional computer system, those skilled in the art will appreciate that the present invention is capable of being distributed as a program product in a variety of forms, and that the present invention applies equally regardless of the particular type of signal bearing media used to actually carry out the distribution. Examples of suitable signal bearing media include: recordable type media such as floppy disks and CD ROMs (e.g., CD ROM 414 of FIG. 4), and transmission type media such as digital and analog communications links (e.g., network 426 in FIG. 4).

FIG. 5 is a block diagram illustrating a memory system 500 in accordance with the preferred embodiments of the present invention. Preferably, the memory system 500 is implemented in a domino static random access memory (SRAM). However, the present invention may be implemented in other types of memory. In the preferred embodiment of the present invention shown in FIG. 5, the memory system 500 comprises a wordline decoder 505, a plurality of semiconductor memory cells 510, a bitline decoder 515, and an input/output circuit 520. The memory system 500 shown in FIG. 5 is similar to the conventional memory system shown in FIG. 1 with the exception that, as discussed in more detail below, memory system 500 adds a delay mechanism 506 to wordline decoder 505 to handle unbalanced read/write paths.

As is conventional, the semiconductor memory cells are arranged in a memory cell array having a grid of bitlines 525 and wordlines 530, with semiconductor memory cells disposed at intersections of bitlines 525 and wordlines 530. For example, the semiconductor memory cells may be arranged in local cell groups as shown in FIG. 2A. During operation, the bitlines and wordlines are selectively asserted or negated to enable at least one of the memory cells to be read or written.

The wordline decoder 505 is coupled to the memory cells 510 to provide a plurality of decoded data, as discussed in more detail below with reference to FIG. 6. As mentioned above, in accordance with the preferred embodiments of the present invention, wordline decoder 505 is provided with delay mechanism 506 for handling read and write paths that are unbalanced with respect to each other.

Additionally, bitline decoder 515 is coupled to the memory cells 510 to communicate data which has been decoded or will be decoded. The input/output circuit 520 is coupled to bitline decoder 515 to communicate data with bitline decoder 515 and to determine a value which corresponds to that data. In accordance with the preferred embodiments of the present invention, the combination of bitline decoder 515 and the input/output circuit 520 is provided by the conventional circuitry shown in FIG. 2B.

FIG. 6 is a block diagram illustrating circuitry 600 of a wordline decoder of a domino SRAM in accordance with the preferred embodiments of the present invention. The wordline decoder's circuitry 600 shown in FIG. 6 is similar to that shown in FIG. 2C with the exception that, as discussed in more detail below, circuitry 600 adds buffers 602, 604 and 606 to delay the signals in the write path (i.e., a write wordline signal WRITE_WL, an address bit signal A0, and an address bit signal A1). The buffers 602, 604 and 606 shown in FIG. 6 together correspond to the delay mechanism 506 shown in FIG. 5.

As shown in FIG. 6, in accordance with the preferred embodiments of the present invention, circuitry 600 outputs an intermediate output signal OUT to other conventional decode circuitry (not shown) that activates the appropriate precharge and wordline signals. As is conventional, a read wordline signal READ_WL and a write wordline signal WRITE_WL are generated as outputs of a flip-flop 610 with a data input signal READ_WRITEBAR. The data input signal READ_WRITEBAR indicates whether a read operation or a write operation will be performed in the next cycle of a clock input signal CLOCK.

The buffer 602 has an input for receiving the write wordline signal WRITE_WL and an output for outputting a delayed write wordline signal WRITE_WL_D, i.e., a delayed version of the write wordline signal WRITE_WL. Hence, the delayed write wordline signal WRITE_WL_D is delayed with respect to the write wordline signal WRITE_WL, as well as the read wordline signal READ_WL. Similarly, buffer 604 has an input for receiving the address bit signal A0 and an output for outputting a delayed address bit signal A0_D, i.e., a delayed version of the address bit signal A0. Likewise, buffer 606 has an input for receiving the address bit signal A1 and an output for outputting a delayed address bit signal A1_D, i.e., a delayed version of the address bit signal A1. Preferably, the delay produced by each of buffers 604 and 606 is substantially identical to that produced by buffer 602.

Three buffers are shown in FIG. 6 for the purpose of illustration. Those skilled in the art will appreciate that a different number of buffers than shown in FIG. 6 may be utilized within the scope of the present invention. For example, the number of buffers utilized may increase or decrease with the number of address bit signals utilized. Also, the buffers 602, 604 and 606 may be separate as shown in FIG. 6, or may be combined.

The delay produced by each of buffers 602, 604 and 606 is selected to provide efficient write performance in a case where the read/write paths are unbalanced. In the case of a domino SRAM with unbalanced read/write paths, for example, the delay is selected to prevent the bitline signal from prematurely falling during a write operation. This write operation timing requirement is discussed in more detail below with reference to FIG. 7.

The delay produced by each of buffers 602, 604 and 606 may be fixed, or may be adjusted based on the write operation timing requirements. In general, the buffers 602, 604 and 606 may comprise any combination of elements that produce the desired fixed or adjustable delay. An embodiment of a buffer that produces a fixed delay in accordance with the preferred embodiments of the present invention is discussed below with reference to FIG. 8. An embodiment of a buffer that produces an adjustable delay in accordance with the preferred embodiments of the present invention is discussed below with reference to FIG. 9.

As is conventional, the read wordline signal READ_WL, the address bit signal A0, and the address bit signal A1 are AND'd together in a decode block 620. In addition, the delayed write wordline signal WRITE_WL_D, the delayed address bit signal A0_D, and the delayed address bit signal A1_D are AND'd together in the decode block 620. These two AND outputs are OR'd in the decode block 620 to produce the intermediate output signal OUT, which proceeds through the other decode circuitry (not shown) that is well known in the art and which ultimately triggers the rising edge of the precharge and the wordline signals.
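
The effect of the buffers on the decode block can be illustrated with a discrete-time sketch in which each signal is a list of 0/1 samples, one per time step. This is a simplified model with hypothetical function names; the two-step buffer delay and the sample waveforms are arbitrary illustrative values, not figures taken from FIG. 6 or FIG. 7.

```python
def delay(signal, steps):
    """Shift a sampled signal later by `steps` samples, padding with zeros."""
    return ([0] * steps + list(signal))[:len(signal)]


def decode_out_trace(read_wl, write_wl, a0, a1, buffer_steps):
    """Sampled OUT of a decode block fed by delayed write-path signals."""
    write_wl_d = delay(write_wl, buffer_steps)  # models buffer 602
    a0_d = delay(a0, buffer_steps)              # models buffer 604
    a1_d = delay(a1, buffer_steps)              # models buffer 606
    out = []
    for t in range(len(read_wl)):
        read_term = read_wl[t] & a0[t] & a1[t]
        write_term = write_wl_d[t] & a0_d[t] & a1_d[t]
        out.append(read_term | write_term)
    return out


def first_rise(trace):
    """Index of the first sample at which the trace is high, or None."""
    return trace.index(1) if 1 in trace else None


# Hypothetical waveforms: the selected signals go high at time step 1.
active = [0, 1, 1, 1, 1, 1]
idle = [0, 0, 0, 0, 0, 0]
read_out = decode_out_trace(active, idle, active, active, buffer_steps=2)
write_out = decode_out_trace(idle, active, active, active, buffer_steps=2)
```

With these inputs, OUT rises at step 1 for the read and at step 3 for the write, so only the write path sees the added delay while both operations share the same decode block.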

FIG. 7 is a timing diagram showing the operation of a domino SRAM in accordance with the preferred embodiments of the present invention. The write operation in the timing diagram of FIG. 7 contrasts with that of FIG. 3, which is a timing diagram showing the operation of the prior art domino SRAM shown in FIGS. 2A, 2B and 2C. In FIG. 3, the bitline signal BL prematurely falls during the write operation. This “early read” problem degrades the write performance of the domino SRAM. In order for write performance to be efficient, this bitline signal BL must exhibit a profile that does not fall prematurely. As shown in FIG. 7, the delay provided by the buffers during the write operation in accordance with the preferred embodiments of the present invention prevents the bitline signal BL (denoted with reference numeral “705” in FIG. 7) from falling prematurely. In the event of a write operation, the buffers delay (as compared to the read operation) the rising edge of the precharge and wordline signals by delaying the start of the decode process. This solves the “early read” problem and enhances the write performance of the domino SRAM.
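
The timing relationship can be expressed as simple arithmetic: the exposure to an early read is the interval during which the wordline is already high but the write data has not yet arrived. The sketch below is only illustrative; the function name, its parameters, and the numbers used in the example are hypothetical and in arbitrary time units, not values read from FIG. 3 or FIG. 7.

```python
def early_read_window(wl_rise, write_data_arrival, write_path_delay=0):
    """Time the wordline is high before write data arrives (0 if none).

    wl_rise: time at which the wordline would rise with no added delay.
    write_data_arrival: time at which the write data reaches the bitlines.
    write_path_delay: extra delay inserted in the write path by the buffers.
    """
    delayed_rise = wl_rise + write_path_delay
    return max(0, write_data_arrival - delayed_rise)


# Hypothetical numbers in arbitrary units.
unbuffered = early_read_window(wl_rise=10, write_data_arrival=40)
buffered = early_read_window(wl_rise=10, write_data_arrival=40,
                             write_path_delay=30)
```

Here the unbuffered case leaves a window of 30 units in which the bitline could start to fall, while a write path delay of 30 units closes the window entirely. A smaller delay would only shrink it, which is one reason an adjustable delay may be useful.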

FIG. 8 is a schematic diagram of an illustrative example of a buffer having a fixed delay for the wordline decoder shown in FIG. 6. The buffer 800 shown in FIG. 8 corresponds to a fixed delay embodiment of the buffers 602, 604 and 606 shown in FIG. 6. As shown in FIG. 8, buffer 800 in accordance with the preferred embodiments of the present invention includes at least two inverters 802, 804 connected in series.
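A series inverter chain of this kind can be modeled in a few lines. The sketch below is illustrative only and rests on two assumptions not stated in the text: that an even number of stages is used so the buffer preserves signal polarity, and that every inverter contributes the same delay. The function names are hypothetical.

```python
def inverter(x):
    """Ideal inverter acting on a Boolean level."""
    return not x

def fixed_buffer(x, stages=2):
    """Pass x through a chain of series inverters (two in the example of FIG. 8)."""
    if stages < 2 or stages % 2:
        # Assumption: an even stage count keeps the output polarity equal to the input.
        raise ValueError("stages must be an even number of at least 2")
    for _ in range(stages):
        x = inverter(x)
    return x

def fixed_buffer_delay(per_inverter_delay, stages=2):
    """Total delay of the chain, assuming identical per-inverter delays."""
    return stages * per_inverter_delay
```

Under these assumptions the buffer leaves the logic value unchanged and contributes only a delay that scales with the number of inverters.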

FIG. 9 is a schematic diagram of another illustrative example of a buffer having an adjustable delay for the wordline decoder shown in FIG. 6. The buffer 900 shown in FIG. 9 corresponds to an adjustable delay embodiment of the buffers 602, 604 and 606 shown in FIG. 6. As shown in FIG. 9, buffer 900 in accordance with the preferred embodiments of the present invention receives an input signal 902 (e.g., the write wordline signal WRITE_WL) which is coupled to one input 904 of a NAND gate 906 and one input 908 of a NOR gate 910. The output 912 of the NAND gate 906 is coupled to the input 914 of an inverter 916. The output 918 of the inverter 916 is coupled to the other input 920 of the NOR gate 910. The output 922 of the NOR gate 910 is coupled to the input 924 of an inverter 926. The output 930 of the inverter 926 provides the delayed output (e.g., the delayed write wordline signal WRITE_WL_D), the delay of which is variable based on a delay lengthening select signal input to the buffer 900. The other input 932 of NAND gate 906 receives this delay lengthening select signal CHSW, which is commonly referred to as a “safety bit” or “chicken switch” signal.

The series combination of the NOR gate 910 and the inverter 926 forms a first delay element. The series combination of the NAND gate 906 and the inverter 916 is commonly referred to as a “chicken switch” and forms a second delay element that is enabled when the delay lengthening select signal CHSW is high. Thus, when the delay lengthening select signal CHSW is low, the delay applied to the input signal 902 is merely that of the first delay element. On the other hand, when the delay lengthening select signal is high, the delay applied to the input signal 902 is the combination of both the first and second delay elements. In this way, the delay applied to the WRITE_WL and address bit signals can be adjusted to match timing requirements.
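The gate connections of buffer 900 can be checked with a steady-state logic model. The following Python sketch follows the wiring described for FIG. 9 but models logic levels only; gate delays are deliberately not modeled, and the function names are hypothetical.

```python
def nand(a, b):
    return not (a and b)

def nor(a, b):
    return not (a or b)

def inv(a):
    return not a

def buffer_900_logic(in_902, chsw):
    """Steady-state output 930 of buffer 900 for a given input 902 and CHSW."""
    n912 = nand(in_902, chsw)   # NAND gate 906 (inputs 904 and 932)
    n918 = inv(n912)            # inverter 916
    n922 = nor(in_902, n918)    # NOR gate 910 (inputs 908 and 920)
    return inv(n922)            # inverter 926 drives output 930

# Enumerate all input combinations of the model.
truth_table = {
    (in_902, chsw): buffer_900_logic(in_902, chsw)
    for in_902 in (False, True)
    for chsw in (False, True)
}
```

In this model the settled output equals the input for either value of CHSW, which is consistent with the select signal changing only the delay through the buffer and not the logic value it passes.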

Similarly, additional chicken switches can be added to the buffer 900 to enhance the variability of the delay applied to the input signal 902. Chicken switches are well known in the art. For example, U.S. Pat. No. 6,833,736 B2, entitled “PULSE GENERATION CIRCUIT”, issued on Dec. 21, 2004 to Nakazato et al., and assigned to IBM Corporation, discloses a pulse generation circuit that utilizes a chicken switch to adjust the pulse width of an input clock signal and is hereby incorporated herein by reference in its entirety.

FIG. 10 is a flow diagram illustrating a method 1000 for adjusting the delay of a write path of a domino SRAM in accordance with the preferred embodiments of the present invention. The method 1000 shown in FIG. 10 corresponds with the adjustable delay mechanism 420 shown in FIG. 4. The method 1000 begins with the determination of write operation timing requirements (step 1010). Step 1010 may, for example, include the determination of whether or not the write performance of one or more memory cells is at or above a threshold level using a first level of delay. The method 1000 continues with the generation of an appropriate delay lengthening select signal (step 1020). Step 1020 may, for example, maintain the delay lengthening select signal at a low level if the memory cells have achieved the desired level of write performance using a first level of delay, or change the delay lengthening select signal to a high level if the memory cells have not achieved the desired level of write performance using the first level of delay. The method 1000 ends with an adjustment of the delay based on the delay lengthening select signal (step 1030).
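The three steps of method 1000 can be sketched as follows. This is a minimal illustration under stated assumptions: write performance is represented by a single hypothetical numeric measure compared against a threshold, the delay values are arbitrary units, and the function and parameter names are not taken from the figures.

```python
def determine_write_timing_ok(measured_write_performance, threshold):
    """Step 1010: is write performance at or above the threshold at the first level of delay?"""
    return measured_write_performance >= threshold

def generate_chsw(write_timing_ok):
    """Step 1020: keep CHSW low if performance is adequate, otherwise drive it high."""
    return not write_timing_ok

def adjust_delay(chsw, first_delay, second_delay):
    """Step 1030: first delay element alone, or both elements when CHSW is high."""
    return first_delay + (second_delay if chsw else 0)

def method_1000(measured_write_performance, threshold, first_delay, second_delay):
    """Run steps 1010, 1020 and 1030 in order; return (CHSW, resulting delay)."""
    ok = determine_write_timing_ok(measured_write_performance, threshold)
    chsw = generate_chsw(ok)
    return chsw, adjust_delay(chsw, first_delay, second_delay)
```

For example, with hypothetical delays of 2 and 3 units, method_1000(0.9, 0.8, 2, 3) leaves CHSW low and the delay at 2, while method_1000(0.5, 0.8, 2, 3) drives CHSW high and lengthens the delay to 5.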

One skilled in the art will appreciate that many variations are possible within the scope of the present invention. Thus, while the present invention has been particularly shown and described with reference to preferred embodiments thereof, it will be understood by those skilled in the art that changes in form and details may be made therein without departing from the spirit and scope of the present invention.

1. A memory system, comprising: a plurality of semiconductor memory cells; a wordline decoder coupled to the memory cells through a plurality of wordlines, wherein the wordline decoder provides a wordline signal to at least one of the memory cells over at least one of the wordlines to allow access to the at least one memory cell for a read operation or a write operation, wherein the wordline decoder generates a read_w1 signal based on whether a read operation is to be performed in the next cycle and generates a write_w1 signal based on whether a write operation is to be performed in the next cycle, wherein the wordline decoder includes a buffer having an input for receiving the write_w1 signal and an output for outputting a delayed version of the write_w1 signal, and wherein the wordline decoder activates the wordline signal based on the read_w1 signal and the delayed write_w1 signal.
 2. The memory system as recited in claim 1, wherein the memory system is a domino static random access memory (SRAM) and the memory cells are SRAM cells.
 3. The memory system as recited in claim 1, wherein the buffer includes at least two inverters connected in series.
 4. The memory system as recited in claim 1, wherein the buffer includes a safety bit logic element to lengthen the delay in response to a value of a delay lengthening select signal.
 5. The memory system as recited in claim 1, wherein the wordline decoder further comprises: a first AND gate having inputs for receiving the read_w1 signal, a first address bit signal, and a second address bit signal; a second buffer having an input for receiving the first address bit signal and an output for outputting a delayed version of the first address bit signal; a third buffer having an input for receiving the second address bit signal and an output for outputting a delayed version of the second address signal; a second AND gate having inputs for receiving the delayed write_w1 signal, the delayed first address bit signal, and the delayed second address bit signal; an OR gate having inputs for receiving the output of the first AND gate and the output of the second AND gate.
 6. The memory system as recited in claim 5, wherein the wordline decoder activates the wordline signal based on the output of the OR gate.
 7. The memory system as recited in claim 1, wherein the wordline decoder further comprises a flip-flop latch having a first input for receiving a clock signal and a second input for receiving a read_writebar signal indicative of whether a read operation or a write operation is to be performed in the next clock cycle, as well as a first output for outputting the read_w1 signal and a second output for outputting the write_w1 signal.
 8. A data processing system, comprising: a processor; a memory coupled via a bus to the processor; a domino static random access memory (SRAM) located within one of the processor or the memory, the domino SRAM comprising a plurality of SRAM cells, a wordline decoder coupled to the SRAM cells through a plurality of wordlines, wherein the wordline decoder provides a wordline signal to at least one of the SRAM cells over at least one of the wordlines to allow access to the at least one SRAM cell for a read operation or a write operation, wherein the wordline decoder generates a read_w1 signal based on whether a read operation is to be performed in the next cycle and generates a write_w1 signal based on whether a write operation is to be performed in the next cycle, wherein the wordline decoder includes a buffer having an input for receiving the write_w1 signal and an output for outputting a delayed version of the write_w1 signal, and wherein the wordline decoder activates the wordline signal based on the read_w1 signal and the delayed write_w1 signal.
 9. The data processing system as recited in claim 8, wherein the processor includes a cache, and wherein the domino SRAM is located within the processor's cache.
 10. The data processing system as recited in claim 8, wherein the buffer includes at least two inverters connected in series.
 11. The data processing system as recited in claim 8, wherein the buffer includes a safety bit logic element to lengthen the delay in response to a value of a delay lengthening select signal.
 12. The data processing system as recited in claim 8, wherein the wordline decoder further comprises: a first AND gate having inputs for receiving the read_w1 signal, a first address bit signal, and a second address bit signal; a second buffer having an input for receiving the first address bit signal and an output for outputting a delayed version of the first address bit signal; a third buffer having an input for receiving the second address bit signal and an output for outputting a delayed version of the second address signal; a second AND gate having inputs for receiving the delayed write_w1 signal, the delayed first address bit signal, and the delayed second address bit signal; an OR gate having inputs for receiving the output of the first AND gate and the output of the second AND gate.
 13. The data processing system as recited in claim 12, wherein the wordline decoder activates the wordline signal based on the output of the OR gate.
 14. The data processing system as recited in claim 8, wherein the wordline decoder further comprises a flip-flop latch having a first input for receiving a clock signal and a second input for receiving a read_writebar signal indicative of whether a read operation or a write operation is to be performed in the next clock cycle, as well as a first output for outputting the read_w1 signal and a second output for outputting the write_w1 signal.
 15. A computer program product for implementing a domino static random access memory (SRAM) in a digital computing device having at least one processor, comprising: a plurality of executable instructions provided on computer readable signal bearing media, wherein the executable instructions, when executed by the at least one processor, cause the digital computing device to perform the steps of: (a) generating a read_w1 signal based on whether a read operation is to be performed in the next cycle; (b) generating a write_w1 signal based on whether a write operation is to be performed in the next cycle; (c) determining a write operation timing requirement; (d) generating a delay lengthening select signal based on the determined write operation timing requirement; (e) varying a delay applied to the write_w1 signal in response to a value of the delay lengthening select signal; (f) activating a wordline signal based on the read_w1 signal and the delayed write_w1 signal and providing the wordline signal to at least one SRAM cell to allow access to the at least one SRAM cell for a read operation or a write operation.
 16. The computer program product as recited in claim 15, wherein the signal bearing media comprises recordable media.
 17. The computer program product as recited in claim 15, wherein the signal bearing media comprises transmission media.
 18. A domino static random access memory (SRAM) array, comprising: a plurality of SRAM cells; a wordline decoder coupled to the SRAM cells through a plurality of wordlines, wherein the wordline decoder provides a wordline signal to at least one of the SRAM cells over at least one of the wordlines to allow access to the at least one SRAM cell for a read operation or a write operation, wherein the wordline decoder generates a first signal based on whether a read operation is to be performed in the next cycle and generates a second signal based on whether a write operation is to be performed in the next cycle, wherein the wordline decoder includes a delay mechanism that delays the second signal relative to the first signal, and wherein the wordline decoder activates the wordline signal based on the first signal and the delayed second signal.
 19. A domino SRAM array as recited in claim 18, wherein the delay mechanism outputs the delayed second signal by applying a fixed delay to the second signal.
 20. A domino SRAM array as recited in claim 18, wherein the delay mechanism outputs the delayed second signal by applying an adjustable delay to the second signal. 